Classification method for imbalance dataset based on genetic algorithm improved synthetic minority over-sampling technique
HUO Yudan, GU Qiong, CAI Zhihua, YUAN Lei
Journal of Computer Applications    2015, 35 (1): 121-124.   DOI: 10.11772/j.issn.1001-9081.2015.01.0121

When the Synthetic Minority Over-sampling Technique (SMOTE) is applied to imbalanced dataset classification, it uses the same sampling rate for every minority-class sample when synthesizing new samples, which makes the over-sampling indiscriminate. To overcome this problem, a Genetic Algorithm (GA) improved SMOTE algorithm, named GASMOTE (Genetic Algorithm Improved Synthetic Minority Over-sampling Technique), was proposed. First, GASMOTE set a different sampling rate for each minority-class sample, so that one combination of sampling rates corresponded to one individual in the population. Then the selection, crossover and mutation operators of GA were applied iteratively to the population until the stopping criteria were met, yielding the best combination of sampling rates. Finally, this best combination was used in SMOTE to synthesize new samples. Experimental results on ten typical imbalanced datasets show that, compared with SMOTE, GASMOTE improves the F-measure by 5.9 percentage points and the G-mean by 1.6 percentage points; compared with Borderline-SMOTE, it improves the F-measure by 3.7 percentage points and the G-mean by 2.3 percentage points. GASMOTE can therefore serve as a new over-sampling technique for imbalanced dataset classification.
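The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the encoding details, the operators, or the fitness function, so everything here is an assumption made for illustration. In particular, the function names (`k_nearest`, `smote_with_rates`, `evolve_rates`), the integer rate encoding with an upper bound `max_rate`, tournament selection, one-point crossover, elitism, and the toy fitness are all hypothetical. A realistic fitness would presumably train a classifier on the over-sampled data and score it with a measure such as F-measure or G-mean.

```python
import random

def k_nearest(minority, i, k):
    """Indices of the k nearest minority neighbours of sample i (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(minority[i], p)), j)
        for j, p in enumerate(minority) if j != i
    )
    return [j for _, j in dists[:k]]

def smote_with_rates(minority, rates, k, rng):
    """SMOTE-style synthesis in which sample i produces rates[i] new points.

    Each new point is a random interpolation between sample i and one of its
    k nearest minority neighbours. Requires at least two minority samples.
    """
    synthetic = []
    for i, n_new in enumerate(rates):
        neighbours = k_nearest(minority, i, k)
        for _ in range(n_new):
            q = minority[rng.choice(neighbours)]
            gap = rng.random()
            synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], q)])
    return synthetic

def evolve_rates(n_samples, fitness, rng, max_rate=5,
                 pop_size=20, generations=30, p_mut=0.1):
    """Search for a per-sample rate vector with a simple GA (assumed operators)."""
    # One individual = one combination of sampling rates, one gene per sample.
    pop = [[rng.randint(0, max_rate) for _ in range(n_samples)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):  # stopping criterion: fixed generation budget
        nxt = [best[:]]           # elitism: carry the best individual over
        while len(nxt) < pop_size:
            # Selection: binary tournament for each parent.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # Crossover: one cut point (needs n_samples >= 2).
            cut = rng.randint(1, n_samples - 1)
            child = p1[:cut] + p2[cut:]
            # Mutation: resample a gene with probability p_mut.
            child = [rng.randint(0, max_rate) if rng.random() < p_mut else g
                     for g in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best

# Toy usage. The fitness below only rewards hitting a target number of
# synthetic samples; it stands in for a classifier-based score.
rng = random.Random(0)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
target_total = 12
toy_fitness = lambda rates: -abs(sum(rates) - target_total)

best_rates = evolve_rates(len(minority), toy_fitness, rng)
synthetic = smote_with_rates(minority, best_rates, k=3, rng=rng)
```

Keeping the fitness as a plain callable separates the GA search from the evaluation step, so a cross-validated classifier score could be substituted for the toy function without changing the rest of the sketch.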
